============================================================================
|| ||
|| MAT -- p.n.; poss. abbrev. of "Match"; also "Matte" [Motion ||
|| Picture Arts]: means of cutting, inserting and superposing ||
|| disparate items. ||
|| ||
|| ------------------- ||
|| ||
|| This program provides a flexible string-searching, pattern-matching ||
|| and substitution mechanism for both text and filenames. Searching ||
|| for a string in a file is fast (it can be three times as fast as the ||
|| AmigaDOS 'SEARCH'). The (much slower) matching scheme is an extended ||
|| version of the standard AmigaDOS pattern-matching convention, with ||
|| the added features of negation and "slicing" of matched strings. It ||
|| will probably be most useful within command (or ARexx) scripts to ||
|| extend the operations possible with AmigaDOS. ||
|| ||
|| ||
|| ||
|| * Searches for strings or patterns within text files. ||
|| ||
|| * Rearranges text within matched lines to ||
|| create new files. ||
|| ||
|| * Tags and labels can be added to the output stream ||
|| at desired points ||
|| ||
|| * Searches directories for matching file names. ||
|| ||
|| * Creates Command Script Files by inserting the whole or ||
|| parts of matched filenames into text templates. ||
|| ||
|| * Control commands for the program may be read from a ||
|| script file as well as the command line. ||
|| ||
|| ||
|| -- Copyright 1990 Peter J. Goodeve -- ||
============================================================================
-- by Pete Goodeve --
July 1990
Overview
________
This program is intended to fill a number of pattern matching needs
not covered by other facilities. It is very flexible, and consequently
has a number of features you may never use, but it is well suited to those
everyday jobs, too.
It handles both string searches (in which the string will be found anywhere
it occurs on the lines being scanned), and pattern matches that compare
each whole line against a pattern. String searches can be at least three
times as fast as if you used the AmigaDOS 'SEARCH' (disk access time may
well limit the improvement, though). Pattern matches are much slower of
course, but provide much more precise control of the operation. You can
also often get the best of both by "filtering" the text first with a string
search and applying the pattern only to those lines that contain the filter
string.
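The filtering idea can be modelled in a few lines of Python. This is a minimal sketch that assumes the Mat pattern has already been translated by hand into an equivalent regular expression. Mat does not use Python regexes, and `scan_lines` and `PRINTF_X` are hypothetical names. The cheap substring test runs first, and the slower whole-line match runs only on lines that pass it.

```python
import re
from typing import Iterable, Iterator, Optional


def scan_lines(lines: Iterable[str], search: Optional[str] = None,
               pattern: Optional[str] = None) -> Iterator[str]:
    """Yield lines that contain `search` (if given) and fully match `pattern` (if given)."""
    compiled = re.compile(pattern) if pattern is not None else None
    for line in lines:
        # Cheap substring test first, in the spirit of Mat's S <Search-String>.
        if search is not None and search not in line:
            continue
        # The slower whole-line match only runs on lines that survived the filter.
        if compiled is not None and compiled.fullmatch(line) is None:
            continue
        yield line


# Hand translation (an assumption) of the Mat pattern "#?printf#?,# X,#?":
# "#?" becomes ".*" and "# " (any number of spaces) becomes " *".
PRINTF_X = r".*printf.*, *X,.*"

src = [
    'printf("%d\\n", X, y);',
    'puts("no output here");',
    'printf("%d\\n", Y);',
]
print(list(scan_lines(src, search="printf", pattern=PRINTF_X)))
```

The pattern alone gives the same answer here. The filter only saves work, because most lines are rejected by the fast substring test before the regex is tried.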
Patterns are in the usual AmigaDOS format (as used by LIST and so on), with
a number of extensions. You can specify "negative match" segments, whose
appearance will cause a match to fail, and you can -- most importantly --
indicate "slice points" that mark pieces of the matched string to be
rearranged in the output.
To illustrate:
MAT S #include mysrc.c
would print out all lines that contain the string "#include" in the
file "mysrc.c". Note that a) the keyword argument "S" signals that
this is a string search (you can also use the full keyword "SEARCH"
-- upper or lower case); b) the string can be anywhere in the line
(though this particular one will usually be at the beginning);
and c) the character '#' has no special meaning in a string search
(as opposed to being a special character in a pattern).
MAT S #include #?.c
would do the same thing for all files in the current directory ending
in ".c". You can have several file specifications in one command
if you like:
MAT S #include test/#?.c WORK:#?/#?.c
would search all such files in the subdirectory "test" and all
(immediate) subdirectories of assigned device "WORK:".
The preceding examples simply display all lines that fit the criterion,
but you can add a "template" to specify exactly what should be output.
MAT S #include T "^F line ^N: ^O" #?.c
Here, the addition of the template, with its preceding keyword "T"
(or full word "TEMPLATE"), specifies that each matching line should
be displayed in this fashion:
"mysrc.c line 5: #include <stdio.h>"
The marker "^F" (two characters -- the caret '^' followed by upper
case 'F' -- NOT a "control-F") indicates that the current file name
should appear here; "^N" represents the current line number, and
"^O" ("Oh", not "zero") indicates the original current line.
There are a number of other template markers you can use, especially
when you are matching against a "slicing pattern" rather than doing a
string search. You can also specify a template for all the lines that
FAIL to match.
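As a rough model of template expansion, the sketch below handles only the three markers described so far (^F, ^N, ^O). It scans the template left to right and replaces each caret pair with its value. The function name `expand` is hypothetical. Leaving a caret followed by any other character untouched is an assumption of this sketch and not necessarily what Mat does.

```python
def expand(template: str, filename: str, lineno: int, line: str) -> str:
    """Expand the ^F, ^N and ^O markers of a Mat-style template."""
    values = {"F": filename, "N": str(lineno), "O": line}
    out = []
    i = 0
    while i < len(template):
        if template[i] == "^" and i + 1 < len(template) and template[i + 1] in values:
            out.append(values[template[i + 1]])  # substitute the marker's value
            i += 2
        else:
            out.append(template[i])  # ordinary character (or unrecognised caret)
            i += 1
    return "".join(out)


print(expand("^F line ^N: ^O", "mysrc.c", 5, "#include <stdio.h>"))
# -> mysrc.c line 5: #include <stdio.h>
```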
In addition to templates, which apply to every line, you can supply
a "Tag", which applies once per file -- immediately before the first
matched line if there is one, otherwise when the whole file has been
read (giving an alternative message). The format is the same as a
template.
MAT S #include TAG "^F:^| No matches found in ^F" #?.c
will display the filename (once) before all lines matched in that
file, or, if there were no matches, report that fact. The "^|"
pair separates the 'success' and 'failure' portions of the tag
(or template).
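The once-per-file behaviour of a tag can be sketched over in-memory "files". This is an illustrative model, not Mat's code. The file names, `split_tag` and `search_with_tag` are hypothetical, only ^F is expanded, and the choice to print nothing for an empty tag part is an assumption.

```python
def split_tag(tag: str) -> tuple:
    """Split a Mat-style tag at "^|" into (success, failure) parts."""
    success, sep, failure = tag.partition("^|")
    return success, (failure if sep else "")


def search_with_tag(files: dict, search: str, tag: str) -> list:
    """Model of MAT S <string> TAG <tag>; `files` maps names to lists of lines."""
    success, failure = split_tag(tag)
    out = []
    for name, lines in files.items():
        tagged = False
        for line in lines:
            if search not in line:
                continue
            if not tagged:
                tagged = True
                if success:
                    out.append(success.replace("^F", name))  # once, before the first match
            out.append(line)
        if not tagged and failure:
            out.append(failure.replace("^F", name))  # whole file read, no match found
    return out


files = {"a.c": ["#include <stdio.h>", "int x;"], "b.c": ["int y;"]}
for line in search_with_tag(files, "#include", "^F:^| No matches found in ^F"):
    print(line)
```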
Pattern matching can be done on the lines of a file in much the same way.
MAT "#?printf#?,# X,#?" mysrc.c
looks for any lines that do a "printf" on variable "X" (with
arbitrary characters intervening). (Notice that no keyword is
needed here, as this is the default case, but if you prefer you
can use the key "P" or "PAT"; for variant forms, or where there
is ambiguity, a keyword may be required.)
You can speed things up by screening the lines first:
MAT S printf P "#?printf#?,# X,#?" mysrc.c
only does the full match on lines that contain printf. The keyword
"P" is needed here to make your intention clear.
You can add "slice-marks" to a pattern, and rearrange the resulting pieces
with a template. The slice-mark is a single caret character placed at the
desired point. (When you use slice marks, you must also use a template.)
MAT #?^(word|another)^#? "^1: ^0----^2" myfile
Here slice marks are placed either side of the bracketed alternation
in the pattern. In the template, "^0" is the segment of the line matched
by the part of the pattern before the first slice mark, "^1" is the
segment between the two marks (i.e. "word" or "another", depending on
which was matched), and "^2" is the rest of the line. Thus, if the
input line was
"this line contains the word we are looking for"
the output would be
"word: this line contains the ---- we are looking for"
Up to this point, the examples have focussed on scanning text files,
but the same mechanisms can equally well examine directories. For the
simple jobs, of course, one would just do a DIR or LIST, or perhaps a
LIST...LFORMAT.., but Mat can do a few things that the others can't.
For instance, the ability to exclude files by means of a "negative
match" can be very useful. Also the template is rather more flexible
than that of LIST..LFORMAT (and pre-dates it by about three years!),
among other things allowing separate specifiers for the directory path
and the filename.
For example, for taking quick looks at a bunch of files, I have a
script that does something like this (although in fact it also
pops up a new full-screen window to do it in, and uses a PIPE:):
.K filepat/A
MAT >T:_mm T "more ^P" F <filepat>
execute T:_mm
"^P" represents a complete pathname, and the keyword "F" (or "FILES")
says that the file names (specified in the filepat argument) are to be
used themselves, rather than be scanned.
The preceding example is similar to the "DPAT" script that comes with
1.3, but you can do other things, too. Suppose (and I have actually
been in this situation) that I have a bunch of files that I have
previously renamed to "myfile.c_0", "myfile.o_0", "myfile_0" and so
on, which I now want to restore to their former names by chopping off
the "_0". I have a script that calls Mat in a similar way to the
above, and does just what I want:
REN myfile#?^_0 AS ^0
The first argument to the script is a pattern to select the files
to be renamed, and the second is the template specification of the
new names. Notice the "slice-mark" '^' in the pattern, and the
marker "^0" (meaning the part to the left of the slice) in the
template. Without bothering to detail the script, the core is a
Mat command line that creates appropriate "RENAME" commands to be
executed.
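The author's script is not shown, so the following is only a hypothetical stand-in for the idea. The slice pattern `myfile#?^_0` is translated by hand into the regex `(myfile.*?)_0`, whose first group plays the role of slice ^0. That group is substituted into the template to build RENAME command lines. The function name `rename_commands` and the sample file list are illustrative.

```python
import re


def rename_commands(names, regex, new_template):
    """Build RENAME lines for names that fully match `regex`.

    Group 1 of `regex` stands in for Mat's slice ^0; other markers are not handled.
    """
    cmds = []
    for name in names:
        m = re.fullmatch(regex, name)
        if m is not None:
            new_name = new_template.replace("^0", m.group(1))
            cmds.append(f'RENAME "{name}" AS "{new_name}"')
    return cmds


names = ["myfile.c_0", "myfile.o_0", "myfile_0", "other.c_0"]
for cmd in rename_commands(names, r"(myfile.*?)_0", "^0"):
    print(cmd)
```

The output lines could then be written to a file and run with EXECUTE, much as the text describes.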
In most cases you will want to have Mat embedded in script files, as this
can drastically simplify the otherwise sometimes complex syntax. Mat, by
the way, returns a "WARN" value (5) to AmigaDOS if no match was found --
useful for conditional execution in scripts.
This has only been a brief sketch of the various modes and so on that the
program has. It is worth bearing in mind though that it has a number of
features -- including the possibility of accepting commands from a script
file or pipe -- that add to its usefulness in things like installation
scripts. These features also let it work well with ARexx (even though it
doesn't itself have an ARexx port).
The following sections go into more detail about Mat's capabilities.
First the command line formats and operating modes are described, then
the keywords are listed in detail. Following this is a full description
of AmigaDOS-type patterns in general, and the extensions used in Mat.
Finally comes a discussion of Templates and File Specifiers.
+ + + + +
Installation and Operation
__________________________
Mat is like any other CLI/Shell command, and is invoked by a command line
with suitable arguments. For normal use it should be available in the
C: directory (or at least on the path of the current shell). For added
speed (under AmigaDOS 1.3 or later) it may be made Resident.
Command Line Arguments
______________________
There are a number of basic command forms corresponding to the various
operations Mat can perform. Each of these in turn can have variants,
and they can often be combined to achieve a particular result. There
are several common basic components, however, which first need clear
understanding.
In the order they normally appear in the command line, from left to
right, they are: <Search-String>, <Pattern>, <Template>, and <Name-Spec>.
Only the last of these is invariably present; the other three MAY all
appear, but quite usually don't. Keywords precede each where necessary
to identify them, and other keywords may be interspersed to give
further control.
<Search-String>: A literal string of characters that will be searched
for in a text file. Any line that contains the string
as a substring is considered to match. No characters in the string
will get special treatment -- there are no "wild-cards" or such.
Matches must be exact, with upper and lower case distinct, unless the
NOCASE keyword is used. Search-Strings are only used in text searches,
not in filename or key matching. A Search-String must be identified by
the required keyword "SEARCH" or its abbreviation "S".
<Pattern>: An AmigaDOS-type pattern specifier string, usually
containing "pattern-structuring" characters with
special meaning. (See the section on "Pattern Matching" later.) It is
always matched against the whole target of concern (a line of text,
filename, or argument). Again, case is relevant unless NOCASE is used.
If a pattern is the first argument on the command line it does not need
a keyword (as it is the default) but if it is elsewhere -- following a
Search-String for example -- it must be preceded by "PAT" or "P". (You
would also have to use the keyword in the remotely possible case that the
pattern was itself identical to another keyword.)
<Template>: A string, usually containing "template markers",
that directs the program as to the output it is to
generate for each matching (and possibly each non-matching) item. (See the
section on "Templates".) If it is absent, the output depends on the
mode: during a text search each matching target is simply output in its
entirety; when matching filenames or keys there is no output at all
without a template. If a Template string immediately follows an
initial Pattern argument that REQUIRES a template (i.e. it has
slice-marks), no keyword is needed, but it is probably best to use one
anyway; it is of course "TEMPLATE" or just "T".
<Name-Spec>: Either a simple string or a pattern that defines
the entities to be examined. There may be several on
one command line. In the Text-Searching modes, it indicates a file or
group of files to be scanned (referred to as a <File-Spec> below). For
Directory Searching, it will normally be the actual search pattern to
match. It may also represent a "key" (such as a command script
argument) that is to be tested directly against a Pattern. There is no
specific associated keyword, but preceding mode-selection arguments may
determine the meaning. For searching text files, no keyword is usually
needed, as this is the default, but you may occasionally have to supply
one to prevent ambiguity (a file name the same as a keyword, for
example); if you do, the word is "FROM". To select Directory
Searching, you use one of: "FILES" (or "F") -- to search only for files
(not directories) --; "DIRS" ("D") -- to retrieve only directory names
that match --; or "NAMES" ("N") -- to match both. To match the
arguments themselves use "KEY" ("K").
Not all keywords have abbreviations by the way -- only some of the
frequent, non-ambiguous ones. All may be given in upper or lower case,
however. If any of the above items have embedded spaces, they must of
course be enclosed in quotes (which will be stripped off before they
are used).
Operating Modes
_______________
Text Search:
The default function of Mat is to search text files for lines that
match a pattern:
MAT <Pattern> <Name-Spec>...
All lines that match are written to standard output (the screen,
normally). For details on patterns, including the Mat extensions and
the possibilities of "slicing" the matched string for use by a
template, see the section on "Pattern Matching".
To perform a string search rather than a pattern match, the form is:
MAT SEARCH <Search-String> <Name-Spec>...
or
MAT S <Search-String> <Name-Spec>...
You can also combine string search and pattern match:
MAT S <Search-String> P <Pattern> <Name-Spec>...
If both a Search-String and a Pattern are specified in the command
line, each line of the text file will be tested for the presence of the
string first, and only if that succeeds will the pattern be applied.
In situations where this can be done, pattern matching is hardly slower
than the string search alone.
With the above forms, you can control the text output with a template
(see the section on "Templates" for full details):
MAT <Pattern> TEMPLATE <Template> <Name-Spec>...
MAT <Pattern> T <Template> <Name-Spec>...
MAT S <Search-String> T <Template> <Name-Spec>...
In the particular case where a pattern has slice marks (see "Pattern
Matching"), and therefore requires a template, the keyword is optional
-- provided that the template argument is in the correct place:
MAT <Slice-Pattern> <Template> <Name-Spec>...
Directory Search:
When searching directories for matching names, a template is normally
expected:
MAT NAMES <Template> <Name-Spec>...
MAT NAMES T <Template> <Name-Spec>...
MAT N T <Template> <Name-Spec>...
The "TEMPLATE" keyword is optional with this precise form, but required
for others. For example, a separate pattern (in addition to the one in
the Name-Spec) is sometimes useful (to further subdivide the set of
found files for template formatting):
MAT <Pattern> T <Template> NAMES <Name-Spec>...
The template itself however is not optional. If you really want no
template (so that nothing will be sent to the output for matches), you must
specify it as a null string; for example:
MAT NAMES "" <Name-Spec>...
The NAMES form locates all matching entries, both file and directory.
The equivalent forms to find just one or the other are:
MAT FILES <Template> <Name-Spec>...
MAT F <Template> <Name-Spec>...
MAT DIRS <Template> <Name-Spec>...
MAT D <Template> <Name-Spec>...
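The difference between NAMES, FILES and DIRS amounts to a filter on the kind of each matching entry. The sketch below models this with Python's `os.scandir` and assumes the name pattern has been translated by hand into a regex. It does not recurse into subdirectories. `dir_search` is a hypothetical name, and sorting the result is only for predictable output, not a claim about Mat's ordering.

```python
import os
import re


def dir_search(directory: str, regex: str, mode: str = "NAMES") -> list:
    """List entries of `directory` whose names fully match `regex`.

    mode: "FILES" keeps only files, "DIRS" only directories, "NAMES" both.
    """
    if mode not in ("FILES", "DIRS", "NAMES"):
        raise ValueError(f"unknown mode: {mode}")
    found = []
    with os.scandir(directory) as it:
        for entry in it:
            if re.fullmatch(regex, entry.name) is None:
                continue
            is_dir = entry.is_dir()
            if (mode == "FILES" and is_dir) or (mode == "DIRS" and not is_dir):
                continue
            found.append(entry.name)
    return sorted(found)
```

For example, `dir_search(".", r".*\.c", "FILES")` roughly corresponds to `MAT F "^P" #?.c`.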
Other Modes:
You can check the arguments in the command line -- probably themselves
supplied as arguments to an execute script -- against a pattern:
MAT <Pattern> KEY <Name-Spec>...
MAT <Pattern> K <Name-Spec>...
Without a template, the KEY form simply checks for a match, and returns
WARN to AmigaDOS if it doesn't find one. No output is generated. Add
a template to get the desired output:
MAT <Pattern> K T <Template> <Name-Spec>...
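Without a template, KEY mode reduces to a yes/no test whose only visible result is the return code. A minimal model follows, assuming a hand-translated regex. `key_mode` is a hypothetical name, while the WARN value of 5 is the one given in this document.

```python
import re

WARN = 5  # value Mat returns to AmigaDOS when nothing matched


def key_mode(regex: str, keys: list) -> int:
    """Model of MAT <Pattern> KEY <Name-Spec>...: 0 if any key matches, else WARN."""
    return 0 if any(re.fullmatch(regex, k) for k in keys) else WARN


print(key_mode(r"verbose|v", ["v"]))
```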
A final mode simply writes all files that match the Name-Spec to
standard output in sequence. No processing of the files is done --
they needn't even be text:
MAT JOIN <Name-Spec>...
You can however supply a Tag-template (see sections on "Keywords" and
"Templates") that will be output for each file. The 'success' portion
will be written BEFORE the file, the 'fail' part AFTER it; either part
may be omitted. Labels are also valid in a JOIN command line.
Value returned to the CLI:
_________________________
When Mat returns to the CLI or Shell, it passes back a value of zero if
it has found at least one match. If it has found no matches at all it
returns a "WARN" value of 5. This happens in all modes, and can be
tested within a command script to see if the intended operation has
been successful. If you just want to know whether a match exists,
without needing to see any output, you can simply redirect the output to
NIL:.
If Mat encounters an error which prevents it from continuing, like an
incorrectly formed pattern, it will return at once with an error code
of 20.
Keywords:
________
Aside from the mode selecting keywords discussed above, there are
a number used to control other features. Some of these take arguments
("SEARCH", "PAT", and "TEMPLATE" being examples we've already met).
In general, keywords can be placed either at the beginning of the line,
or at any appropriate later point, as long as they don't separate a
keyword and its argument. The exact effect may depend on where on the
command line they are placed; in many situations you could have several
interspersed along the line. Mat always processes the command arguments
in sequence, from left to right (unlike the position independent
keywords of AmigaDOS commands). All keywords may be in upper or lower
case. When so specified, they may be abbreviated to their first letter.
To summarize the mode selection keywords already mentioned:
FILES -- searches for matching filenames
DIRS -- searches for matching directory names
NAMES -- searches for both files and directories
KEY -- checks the command line for matches
JOIN -- sequentially copies all matching files to the output
All the above may be abbreviated to their first letter.
You may change modes within a command line if for some reason you need
to. The change takes place at the point the keyword is encountered.
These component specifiers have also been mentioned:
FROM <Name-Spec> -- sets text search mode and specifies
that the following argument is a Name-Spec
(No abbreviation). Rarely needed.
SEARCH <Search-String>
-- declares its argument to be a string
for quick scanning of following text files
(abbreviation "S").
PAT <Pattern> -- specifies that its following argument
is a pattern (abbreviation "P").
TEMPLATE <Template> -- specifies that its following argument
is a template (abbreviation "T"). The
template string may be null ("") to cancel
any previous template, or satisfy the
Directory Search's requirement for one.
Only one Search-String, one Pattern and one Template may be active
at any time. You can change any of them at any point in the command
line, though, replacing the old ones.
There are two other keywords that also take a template form argument:
TAG <Template> -- defines a "tag" (for text file modes only)
that will be output at most once for each file.
The 'success' portion of the template (see the "Templates" section)
will be output immediately before the first match found in the file;
the 'failure' part will be output if no match is found by the end
of the file. Note that this applies even if non-matching lines
are being output (by a 'fail' part of the line template)!
Non-matching lines before the first match will also appear BEFORE
the tag. Some of the selectors that are available for a line
template (such as slices) are not so appropriate within a tag
template; they will be ignored if used. Declaring a new tag
replaces any previous one; only one may be active at a time.
LABEL <Template> -- outputs a "label" at the point it is
encountered in the command line. There
are the same restrictions on available template selectors as for
tags. If no "fail" part is supplied, the label will be output
unconditionally; if you do supply a fail part (after "^|"), the
"success" part preceding the divider will only be output if there
has been at least one match to that point, otherwise the fail
part will be output.
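The rule for which part of a label is printed can be summarised in a few lines. This is a minimal sketch of that decision only. Template markers are not expanded, and `label_output` is a hypothetical name.

```python
def label_output(label: str, matches_so_far: int) -> str:
    """Return the part of a Mat-style LABEL that would be printed."""
    success, sep, failure = label.partition("^|")
    if not sep:
        return success  # no fail part: output unconditionally
    return success if matches_so_far > 0 else failure


print(label_output("--- end of C files ---", 0))
print(label_output("Some matches^|Nothing found", 0))
```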
No abbreviations are provided for either of the previous or any of the
remaining keywords.
The following subsidiary mode controls can be put anywhere appropriate
on the command line:
NOCASE -- causes all subsequent searches to ignore the case of
pattern and text characters. It can be put anywhere in
the command line subject to the above restrictions; file specifiers
appearing before it will not be affected.
CASE -- cancels the effect of a previous NOCASE.
FIRST -- is only appropriate in text matching modes. It
causes the search of each file after it on the command
line to terminate when the first match is found. It is useful when
you just want to determine which files contain a pattern, rather
than listing every occurrence. It is compatible with templates and
other options.
ALL -- reverses the effect of FIRST if you should need to
do so within a command line.
NOLINES -- prevents the usual newline character being output
after each match. All subsequent matches will be shown
on the same line unless the template dictates otherwise. Don't
forget that you will usually want some sort of separator in the
template, such as a space. It can be used in any mode.
LINE -- reverses the effect of NOLINES if this has been
given previously. (Apologies for the plural/singular
disparity, but it isn't quite the inverse.) It also inserts a
newline into the output at that point; you can use it just for this
if you want an extra blank line between file specifiers.
ZERO -- resets the item and match counters (available for
templates) to zero. Has no other effect on the state.
RESET -- sets the system back to the initialized state:
pattern, template, and tag are cleared, and the
counters are zeroed. Only the internal success flag (that
controls the value returned to AmigaDOS) is left unchanged.
This option is intended for command script use (see below);
you are not likely to need it on a command line.
Two keywords can be used to control input and output channels. Each
takes a single Filename argument (NOT a pattern!):
OUT <File> -- diverts future output to the named file (or
device). Any existing file of that name is deleted.
Any previously selected destination will be closed (except
standard output of course).
OUT - -- (a single dash as the argument) closes any
currently selected output file and restores standard
output (the screen, unless AmigaDOS redirection has also been
used).
WITH <File> -- reads control commands from <File> instead of
the command line. All keywords and argument types
are valid in a script, but NOTHING is reset at the end of a line.
(Don't break a line between a keyword and its argument, though!)
The RESET keyword must be included when you need to clear the
decks. When the end of the script is reached, control returns
to any further arguments on the command line. As with a CLI
command line, anything after a semicolon on a line is ignored:
you may comment your scripts.
+ + + + +
Pattern Matching
________________
The pattern matching algorithm used by Mat is an extension of the
standard file pattern matching scheme used by AmigaDOS. Many people may
not appreciate how general and flexible the method is. It is many times
more capable than the simple "wild-card" matching available on most
personal computers. There are some things that the standard algorithm
doesn't have which would often be useful, and I have done my best to supply
some of these in this extended version.
The discussion that follows may be a fuller exposition of how to use
pattern matching than is available from other sources. If you leave out
references to the "universal-match" character "*", "negation matches", and
"slicing", everything discussed applies just as well to standard AmigaDOS
patterns, which can be used in commands like LIST, DELETE, and COPY.
A pattern is a text string constructed from "plain characters" and
"special characters". It represents a set (possibly a large set) of text
strings that will match it. Remember that it always matches complete
strings; this is not the same as a simple text search, where a match is
signalled if the search string is found anywhere within the source text.
The string being matched by the pattern is always "bounded" in some way,
either because it stands alone -- like a file name -- or because, say, it
is a complete line of text. The newline character at the end is not
usually available to the matching process.
If a pattern argument in a command line contains spaces, it must of
course be enclosed in quotes. There is, unfortunately, no way of including
quotes in a pattern which is itself enclosed in quotes (because of the
way C handles argument strings).
The syntax of the pattern structure is such that complex patterns can
be built from simple ones. Broadly speaking, patterns may be chained end
to end so that successive segments of a complete target string may be
matched by successive segments of the pattern. In addition, each pattern
segment can specify "alternatives": if any of these match, the whole
segment matches.
Plain Characters:
The simplest pattern is a string of plain characters. This will only
match a target string consisting of exactly the same characters in the
same order, which is obviously of limited usefulness. The only case
where you are likely to want this is when getting a particular file
name, and the program is smart enough to go directly to the file in
this case rather than doing a search.
Special Characters:
To build more general patterns we need the special characters. These do
not represent themselves (unless special action is taken): they are
instead the structural elements of the patterns we
desire. Using them we can build patterns -- or subpatterns -- that will
match, say, any single character, any five characters, any arbitrary
string, or a string that is one of several possible specific
alternatives. We can then put such subpatterns together to end up with
a complete pattern that will match all the various possibilities we are
looking for and no others. The possibilities should become clearer as
we get to specific examples.
The seven special characters used in AmigaDOS file matching are:
' ? | ( ) # and %
To these Mat adds two more:
~ and ^
We'll look at them briefly in order, before we get into a fuller
exploration:
" ' " makes the character following it into a plain character.
" ? " matches ANY single character.
" | " separates alternative patterns.
" ( " and " ) " enclose patterns used in building larger ones.
" # " causes a match to any number of repetitions of the pattern
it precedes.
" % " matches the null string when syntactically necessary.
" ~ " is one way (of two) of specifying negation.
" ^ " slices a matched string into segments.
Quoting Characters:
The single quote (" ' ") is used to turn any special character
immediately following it into a plain character. Thus to match against
an actual question mark in a target text you would include the pair
" '? " in the pattern. And of course it can quote itself.
Matching Any Character:
The question mark matches ANY single character. Thus:
???
matches "abc", "xyz", and so on, but not "ab" or "abcd".
Matching Alternatives:
The vertical bar (" | ") separates "alternatives". If any of a set of
patterns separated by bars matches the target, the match is successful.
For example:
abc|def|qwertyuiop
would match any of those three strings, but no others.
The pattern
abc|x?z
would match "abc" or "x" and "z" separated by any single character.
Building Patterns from Others:
The left and right parentheses can be used to enclose a pattern that
you want to match as a unit when it is part of a larger pattern. As one
example we could look for any two characters followed by "abc" or "def"
with the pattern:
??(abc|def)
Combine two or more patterns in sequence this way:
(abc|def)(xxx|yyy)
This will match "abcxxx", "abcyyy", "defxxx", and "defyyy".
Patterns can be nested as far as you like with parentheses:
a(bc|??(xx|yy))d
will match "abcd", or any six-letter group beginning with "a" and
ending in "xxd" or "yyd".
Redundant parentheses do no harm. They may be useful to distinguish
patterns from other constructs.
Pattern Repetition:
The " # " character is always followed by a (sub)pattern. It will match
ANY number of (exact) repetitions of that pattern (INCLUDING zero). The
pattern may be a single letter, but if it isn't it must be enclosed in
parentheses. Thus:
#(ab)
matches "ab", "abababab", or simply an empty string. It does not match
"ababa".
The pattern to be repeated may be any legal pattern, including more
repetition constructs if you want:
#(ab|?x|#(xy)z)
will match such strings as "abab", "zxab", "qxxyxyxyxyzxyzab", and so
on. It will NOT match "abxy".
Matching the Empty String:
The " % " character is used where you have to specify an empty ("null")
string -- normally as one of a number of alternatives. The
construction
(|abc)
is not legal; instead you must use:
(%|abc)
which will match either "abc" or the null string.
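The standard constructs described so far can be tried out with a small translator into Python regular expressions. The sketch below is a recursive-descent model of the AmigaDOS subset (' ? | ( ) # %). It is not Mat's matcher, and the names are hypothetical. Mat's ~ and ^ extensions are rejected rather than modelled. Like Mat, it matches the whole target (`re.fullmatch`) and refuses an empty alternative such as "(|abc)".

```python
import re


class _AmigaPattern:
    """Recursive-descent translator for the standard AmigaDOS pattern subset."""

    def __init__(self, pat: str):
        self.pat, self.pos = pat, 0

    def _peek(self):
        return self.pat[self.pos] if self.pos < len(self.pat) else None

    def _take(self):
        c = self._peek()
        self.pos += 1
        return c

    def alternation(self) -> str:
        branches = [self.sequence()]  # alternatives separated by "|"
        while self._peek() == "|":
            self._take()
            branches.append(self.sequence())
        return "|".join(branches)

    def sequence(self) -> str:
        items = []  # items matched end to end
        while self._peek() not in (None, "|", ")"):
            items.append(self.item())
        if not items:  # e.g. "(|abc)": the null string must be written as %
            raise ValueError("empty alternative; use % for the null string")
        return "".join(items)

    def item(self) -> str:
        c = self._take()
        if c == "'":  # quote: the next character is plain
            q = self._take()
            if q is None:
                raise ValueError("quote at end of pattern")
            return re.escape(q)
        if c == "?":  # any single character
            return "."
        if c == "%":  # the null string
            return ""
        if c == "#":  # zero or more repetitions of the next item
            if self._peek() in (None, "|", ")"):
                raise ValueError("# must be followed by a pattern")
            return "(?:" + self.item() + ")*"
        if c == "(":  # grouped subpattern
            inner = self.alternation()
            if self._take() != ")":
                raise ValueError("missing )")
            return "(?:" + inner + ")"
        if c in "~^":
            raise ValueError("Mat extensions ~ and ^ are not handled by this sketch")
        return re.escape(c)  # plain character


def to_regex(pattern: str) -> str:
    parser = _AmigaPattern(pattern)
    regex = parser.alternation()
    if parser.pos != len(pattern):  # leftover text, e.g. an unmatched ")"
        raise ValueError("unbalanced )")
    return regex


def amiga_match(pattern: str, target: str) -> bool:
    """True if the whole of `target` matches `pattern`."""
    return re.fullmatch(to_regex(pattern), target, flags=re.DOTALL) is not None


print(amiga_match("a(bc|??(xx|yy))d", "aqqyyd"), amiga_match("#(ab)", "ababa"))
```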
Negated Matching:
Mat extends the basic pattern matching syntax by allowing you to
specify patterns that if matched will cause the overall match to fail.
If a negated segment is included in a pattern, and the target string
has ANY POSSIBLE match of the whole pattern that includes that segment,
the match cannot succeed. There are restrictions on negation patterns
not shared by the structures we've talked about up to now; in
particular they can't be nested -- you can't negate a negation --
although they can be inserted at any level in the pattern.
There are two ways of specifying negated patterns. The first will
match ANY string UNLESS it exactly matches the pattern; it is
constructed by prefixing the pattern by the tilde (" ~ "):
ab~(cd)e
will not match "abcde", but will match any other string that begins
with "ab" and ends with "e", such as "abxxxe", "abe", "abce", etc.
The second form is a "negated alternative", indicated by two adjacent
vertical bars (" || "). This is used when, rather than matching ANY
string that is not the negated one, you have a set of patterns you want
to match UNLESS the negated part is also matched. Thus:
a(b?d|?c?||bcd)
will match four character strings such as "abxd", "accc", "abcx", as
long as the whole string is not "abcd".
You can have more than one negated segment, as long as one does not
appear inside another. Thus the following sort of thing is possible
(whether it's also useful though...?):
a~bc~(de)(???||fgh||xyz)
Remember that this will be forced to fail if there is any possible
match that includes a negated section. Thus these will succeed:
acxxx
abbbcddeabc
acdexy
and these will fail:
abcxxx
abbcdexxx
aczxsdefrgthcjxsxcxyz
To stress it once again, a negated match is "aggressive": if there
is ANY possible match that includes a negated section, the whole
match will fail.
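The "aggressive" rule can be modelled as a set difference. The target must match a widened version of the pattern in which each ~X becomes "anything" and each (A||B) keeps only A. It must also match none of the variants in which one negated segment is forced to match. The sketch below uses hand translations of the examples above and reads "~b" as negating the single character b, in the same way that # applies to a single character. It is a model of the stated behaviour, not Mat's algorithm, and `negated_match` is a hypothetical name.

```python
import re


def negated_match(target: str, positive: str, negated: list) -> bool:
    """Model of Mat's negation using hand-translated regexes.

    positive: pattern with each ~X widened to .* and each (A||B) reduced to A.
    negated:  one regex per negated segment, with that segment forced to match.
    """
    if re.fullmatch(positive, target) is None:
        return False
    return all(re.fullmatch(n, target) is None for n in negated)


# Hand translations (an assumption) of a~bc~(de)(???||fgh||xyz).
POSITIVE = r"a.*c.*..."
NEGATED = [
    r"abc.*...",            # the ~b segment matched as "b"
    r"a.*cde...",           # the ~(de) segment matched as "de"
    r"a.*c.*(?:fgh|xyz)",   # a negated alternative matched
]
for s in ["acxxx", "abbbcddeabc", "acdexy", "abcxxx", "abbcdexxx", "aczxsdefrgthcjxsxcxyz"]:
    print(s, negated_match(s, POSITIVE, NEGATED))
```

The simpler examples fit the same scheme: ab~(cd)e becomes `ab.*e` minus `abcde`, and a(b?d|?c?||bcd) becomes `a(?:b.d|.c.)` minus `abcd`.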
Slicing the Matched String:
You can include "slice marks" (the caret -- " ^ ") in your pattern to
select out pieces of the matched string that can be treated individually.
Mat will arrange these "slices" in a manner specified by a template
(next section), to generate desired output.
Once again there is a restriction on the use of this character that
does not apply to the others: only the first four of these marks
encountered during a match will be recorded; any after this will be
ignored. Note that this doesn't mean you can only include a maximum of
four marks; if they are inside alternatives that don't match any part
of the target string, the scan will never encounter them. You should
be sure of what you are doing, though, if you don't want to be
surprised by the program's choices. We'll return to this, and some
other points you should note about the behaviour of slice marks, later.
If there is more than one possible match of the pattern to the target,
the slice will be made at the earliest possible point. Remember this
especially when you have repetitions in your pattern.
Examples:
The pattern #?^x#?
will cut abcdxyz
into abcd xyz
It will also cut abcxxxx
into abc xxxx
The pattern #?^x#?y^#?
will cut abcxxxxyz
into abc xxxxy z
The pattern #?^#x^#?
won't cut much of anything! (because "#x" also matches the null
string.) The first two slices will simply always be empty, and the
third slice will contain the whole string.
The pattern #?^(word|another)^#?
will cut "here is another word for you"
into "here is " "another" " word for you"
(using quotes in this case to mark off the slices). Notice that
the cuts are made around "another" rather than "word" because the
earliest match is found.
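If you want to experiment with slicing outside Mat, the example patterns above can be approximated with Python regular expressions: each slice becomes a group, and lazy quantifiers ("*?") make each cut fall as early as possible. These are hand translations for illustration only. Python's matcher is not Mat's, and it will not reproduce Mat's quirks with slice marks inside alternatives.

```python
import re

# Hand translations of the example patterns (illustrative, not generated by Mat).
# "#?" becomes ".*?" (lazy, so the cut comes as early as possible),
# "#x" becomes "x*?", and each slice mark starts a new group.
EXAMPLES = {
    "#?^x#?": r"(.*?)(x.*)",
    "#?^x#?y^#?": r"(.*?)(x.*?y)(.*)",
    "#?^#x^#?": r"(.*?)(x*?)(.*)",
    "#?^(word|another)^#?": r"(.*?)(word|another)(.*)",
}


def cut(regex: str, s: str):
    """Return the list of slices, or None if the whole string does not match."""
    m = re.fullmatch(regex, s, re.DOTALL)
    return list(m.groups()) if m else None
```

Note that the spaces on either side of "another" stay with the outer slices, which is worth remembering when you splice slices back together.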
Slice marks within alternatives can be used, as noted above, but are
tricky. Because of the way the marks are recorded internally, if two
different alternatives containing them match, both marks will be
reported but the position of one of them will be wrong (probably at the
beginning of the string). So it is best to keep the slice marks
outside of any alternation constructions (as shown in the last example
above).
Be careful using negation with slice marks. As noted above, any
match with the negated section causes the whole match to fail: it
does not try to find another match. Therefore you can't use a
negation to force a slice mark to be in a different place. In
general there are some limitations such as these which may prevent
you from cutting up strings exactly the way you want (chopping out a
variable number of spaces can be very awkward, for example).
Templates
_________
The Templates Mat uses to generate output lines are basically simple
text strings with "splice-markers" and other "selectors" that indicate
where the pieces of the matched string and other items are to be inserted.
The text segments of a template can be anything you want (except a newline
-- there is a selector for this). A special marker can be used to divide
the template string into "success" and "fail" halves; the "success" part
controls the format of output lines for matches, while the "fail" part will
be output for each input string that doesn't match. Output strings are
always terminated with a newline, unless the NOLINES option is in effect.
You can cancel an existing template by supplying a null string ( "" ). Any
current Tag may be cancelled in the same way. The effect is slightly
different in Directory Search mode: this normally EXPECTS a template to
control the output (and you can't simply omit it), but you can defeat the
requirement by supplying a null template, in which case matches will
produce no output.
Each marker (or selector -- the terms will be used interchangeably) is
a character pair: the caret (" ^ ") followed by a selector character.
Slices from the matched string are numbered -- "^0" to "^4" . Other items
have identifying letters, such as "^N" for line number; the case of these
letters is important (all are currently upper case because you are already
holding down the shift key for the caret). The success/fail divider uses
the vertical bar: "^|".
Not all selectors are valid, or at least have exactly the same meaning,
under all conditions. For example you can't use slices from a matched line
in the "fail" section of a template because -- obviously -- there aren't
any. Similarly, "line numbers" naturally apply only to text matching, but in
file name matching mode the same selector (^N) keeps track of the number of
files encountered. If you use a selector that is not valid it is simply
skipped over. Of course you can use any selector more than once within a
template.
If a template argument in a command line contains spaces, it must of
course be enclosed in quotes. As with a pattern, you can't include quotes
in a template which is itself enclosed in quotes: use the "^Q" selector
instead. You can include a caret character in the output string (if, for
example, you are generating a new Mat command in a script) by the pair "^^".
Slice Selectors:
As four slice marks are allowed in a pattern, there can be a maximum of
five slices of the matched string. These are selected by "^0" for the
piece from the beginning of the string to the first mark, "^1" for the
piece between the first and second, up to "^4" for the remainder of the
string beyond the fourth mark. If there are fewer than four slice
marks, the slice associated with the final existing mark extends to the
end of the string, and all higher-number pieces are empty. Thus if
there are only two marks, "^2" covers the remainder of the string, and
"^3" and "^4" are empty.
For instance, if we use this pattern, with two slice marks:
#?^word ^#?
and this template -- which will omit slice 1:
^0^2
to match and rearrange the string:
"this word will be missing"
we will end up with:
"this will be missing"
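The numbering rule can be sketched in Python, assuming (purely for illustration; Mat's internals aren't described here) that the scan hands back the character offsets of the marks in the order it met them. Only the first four offsets are used, and missing slices come back empty.

```python
def slices_from_marks(s: str, marks):
    """Cut s at recorded slice-mark offsets and return slices ^0..^4.

    marks -- hypothetical representation: character offsets of the marks,
             in the order the scan encountered them (assumed non-decreasing).
    """
    kept = list(marks)[:4]                 # marks after the fourth are ignored
    cuts = [0] + kept + [len(s)]
    slices = [s[a:b] for a, b in zip(cuts, cuts[1:])]
    # With fewer than four marks the last slice runs to the end of the
    # string and the higher-numbered slices are empty.
    return slices + [""] * (5 - len(slices))
```

For the example above, the marks fall after "this " and after "word ", i.e. at offsets 5 and 10, giving "this ", "word ", "will be missing" and two empty slices.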
Slice selectors are only meaningful in match-templates. They are
ignored in 'fail' templates (see below) and in Tags and Labels.
Line Number Selector:
The marker "^N" placed in a template string will insert the current
line Number within the file being scanned at that point into the output
string. It can be used in both "success" and "fail" portions of a
template. In tags and labels it will represent the lines read to that
point. If used in Directory Search or Argument Scanning modes, it
represents the total number of items scanned to that point; it is not
automatically reset in these modes (use ZERO to do so).
So the pattern #?^(word|another)^#?
and template ^N: ^1
would generate something like 234: another
Index Number Selector:
The pair "^I" inserts an Index number representing a count of matches
so far. The count is kept from the beginning of the program, and is not
reset with a new file (use ZERO to do so). You may use it in the "fail"
section of a template, also in tags and labels, but remember it will
indicate the number of matches, not lines output.
"^I" also works in Directory or Argument Search mode. If no pattern
(aside from the file specifiers themselves) is supplied in these modes,
it will have the same value as "^N", but you can also match the total
set of files found against a Pattern argument, in which case "^I" will
reflect these matches rather than total files.
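The difference between the two counters in text matching can be seen in a small Python sketch. Here a plain substring test stands in for a Mat pattern, the function name is hypothetical, and the output imitates a template such as "^N [^I]: ^O". The line number restarts with each file, while the index keeps counting matches across files.

```python
def number_matches(files, needle):
    """Hypothetical sketch of ^N (line in file) versus ^I (matches so far).

    files  -- a list of files, each given as a list of lines
    needle -- plain substring standing in for a Mat pattern
    """
    out = []
    index = 0                                   # ^I: not reset between files
    for lines in files:
        for line_no, line in enumerate(lines, start=1):   # ^N: restarts per file
            if needle in line:
                index += 1
                out.append(f"{line_no} [{index}]: {line}")  # like "^N [^I]: ^O"
    return out
```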
Original String Selector:
The pair "^O" ("Oh", not "zero" -- I probably should have chosen a
better one...) represents the unsliced Original string. It can be used
in both the "success" and "fail" parts of a template. Thus, to simply
put a line number in front of each matched line, you could use the
template:
^N: ^O
In File Matching mode, this selector is the same as "^F" (below); in
Argument Matching it represents the current argument. It is a null
string for tags and labels in all cases.
Line Break Selector:
The pair "^B" Breaks the output line at that point with a newline
character. For instance, to output line number and slice-1 on one
line, followed by the original string on a new line, you would use:
^N: ^1^B^O
Quote Mark Selector:
It is not usually possible to embed quote marks in template strings
directly, so you can use the selector "^Q" to make them appear at that
point in the output line. For example:
^0 ^Q^1^Q ^2
File Name Selector:
"^F" selects the local name of the current File (i.e. without any
directory prefix), in both text and file name matching modes. (Only
in Argument Match is it a null string.)
For example, if you have a filename specifier argument (see later)
:work#?/#?.txt
which has found the file
Work Disk:work_1/sample.txt
the "^F" selector will insert
sample.txt
If the item referenced by the specifier is a Device (such as a PIPE:)
rather than a file, "^F" will return the FULL name as specified
(pattern specifiers will never find devices, anyway). Thus the
specifier:
PIPE:xyz
simply returns:
PIPE:xyz
Warning: if the found object is a directory AND is a root device (e.g.
"DF0:" or "DF1:") the Filename is the full name of that disk but WITHOUT
the terminating colon! (but the Full Pathname (below) is correct).
Full Pathname Selector:
"^P" represents the FULL pathname (from the parent device) of the
file. Thus for the immediately preceding example, it would supply:
Work Disk:work_1/sample.txt
Directory Path Selector:
"^D" in a template will insert the full path to the Directory of the
current file. Thus for the above file, ^D would insert:
Work Disk:work_1
If the object found is a Device rather than a file, this is a null
string (but the Full Pathname (^P) will be the same as the Filename
(^F) -- see above). It will also be a null string if the found object
is a directory and the specifier was in the form of a "Device" or
"Parent" reference -- in other words like "xyz:" or "/"; in this case,
the Filename ^F is NOT the full pathname -- just the usual simple name
string.
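For an ordinary file given by a full path, the three file selectors can be illustrated with a Python sketch that simply splits the path at its last separator. The helper is hypothetical and ignores the special cases for devices and root directories described above. It also assumes (this manual does not say) that for a file at the top level of a volume ^D would keep the colon.

```python
def split_amiga_path(path: str):
    """Hypothetical sketch: values of ^F, ^D and ^P for a fully qualified file path.

    Devices (such as PIPE:) and root directories are not handled.
    """
    if "/" in path:
        directory, _, name = path.rpartition("/")
    else:
        device, colon, name = path.rpartition(":")
        directory = device + colon     # assumption: keep the colon, e.g. "DH0:"
    return {"F": name, "D": directory, "P": path}
```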
Literal Caret Mark Selector:
"^^" included in a template will generate a single caret mark in the
output.
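Putting the simple selectors together, a template expander might look like the Python sketch below. It is an illustration under assumptions, not Mat's code. Slice values arrive as a five-item list (or None when there is no match), letter selectors such as ^N, ^I, ^O, ^F, ^P and ^D are passed in as keyword values, and anything unrecognised is skipped, as the manual says invalid selectors are. The "^|" divider is not handled here.

```python
def expand(template: str, slices=None, **fields) -> str:
    """Hypothetical sketch of Mat-style template expansion.

    slices -- five strings for ^0..^4, or None when there was no match
    fields -- values for letter selectors, e.g. N=234, O="...", F="x.txt"
    """
    fixed = {"B": "\n", "Q": '"', "^": "^"}   # ^B newline, ^Q quote, ^^ caret
    out = []
    i = 0
    while i < len(template):
        if template[i] != "^":
            out.append(template[i])           # ordinary text is copied through
            i += 1
            continue
        sel = template[i + 1:i + 2]           # "" if the caret is the last character
        i += 2
        if sel in fixed:
            out.append(fixed[sel])
        elif sel in ("0", "1", "2", "3", "4"):
            if slices is not None:            # no slices exist for a failed match
                out.append(slices[int(sel)])
        elif sel in fields:
            out.append(str(fields[sel]))
        # Anything else (an invalid selector, a lone trailing caret, or the
        # ^| divider, which this sketch does not handle) is skipped.
    return "".join(out)
```

For instance, expand("^0^2", ...) with the slices of the "this word will be missing" example gives "this will be missing", and expand("^N: ^1", ..., N=234) with the "another" slices gives "234: another".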
Failure Template Marker:
A simple template is only applied to strings which have been matched,
and nothing is output when there isn't a match. You can split the
template, however, into two subsections with the special success/fail
division marker "^|". The section preceding this mark is applied for a
successful match just like a simple template; the section following it
is used if the match fails. In the "fail" section, any selectors
desired can be used, except the five slices "^0" - "^4".
A simple use would be to output all lines, whether or not they matched,
but mark or rearrange the matched lines in some way. For example the
following would output them all but put a flag and index number on
each matched line (and corresponding blanks before an unmatched one):
MATCH[^I]> ^O^| ^O
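Here is a rough Python sketch of how a split template picks its half for each line. A plain substring stands in for the pattern, only ^I and ^O are expanded, and the function name is invented for the example.

```python
def apply_split_template(lines, needle, template):
    """Hypothetical sketch of a "success^|fail" template applied to each line.

    Only ^I and ^O are expanded; needle is a substring standing in for a pattern.
    """
    success, sep, fail = template.partition("^|")
    out = []
    index = 0                                  # ^I: count of matches so far
    for line in lines:
        if needle in line:
            index += 1
            part = success
        elif sep:
            part = fail                        # split template: fail half
        else:
            continue                           # simple template: no output
        out.append(part.replace("^I", str(index)).replace("^O", line))
    return out
```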
A tag may be split with the same "^|" divider, in which case the
success part will be output immediately before the first match as
usual, but the fail part will only appear if there is no match in
the file. (In JOIN mode the behaviour is a little different: both
parts will be output if present, 'success' before, and 'fail' after,
each file.) In addition, the success part may be empty (the tag begins
with the marker); in this particular case, ONLY the fail part will ever
appear -- nothing, not even a newline, is output before the first
match.
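The tag rules for a single file in normal (non-JOIN) mode can be sketched the same way. Again this is a hypothetical Python illustration: matching lines are passed through unchanged, and a substring stands in for the pattern.

```python
def tag_output(lines, needle, tag):
    """Hypothetical sketch of a split tag "success^|fail" for one file (non-JOIN)."""
    success, sep, fail = tag.partition("^|")
    out = []
    matched = False
    for line in lines:
        if needle in line:
            if not matched and success:
                out.append(success)    # success part: just before the first match
            matched = True
            out.append(line)
    if not matched and sep:
        out.append(fail)               # fail part: only if the file had no match
    return out
```

With an empty success part ("^|NONE"), nothing at all is emitted ahead of the first match, matching the rule described above.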
This last does not apply to match-templates, where a newline will
be written even when the success part is empty. If you really want no
output on matches, add the NOLINES keyword and place "^B" markers
suitably where you DO want newlines.
File Specifiers
_______________
The arguments in the command line you supply to specify the files that
Mat will examine are really just like those you might give to any AmigaDOS
command, but there are one or two extra features.
For text file searches you will probably most often want to specify a
single file. You do this in the usual way with either the local name of a
file in the same directory, or a path name that includes the chain of
directories needed to reach that file in another. Unlike many other
programs that allow pattern matching in filenames, by the way, Mat is
perfectly happy with specific Device names (non-filesystem), such as PIPEs
or the Serial Device.
In place of the simple file name, you can use a pattern to match a group of
files in the same directory. Unlike other AmigaDOS commands this pattern
can employ the extended matching features described above ("*", "~", and
"||"). Slice marks can also be used where they are appropriate (see below).
As is usual in AmigaDOS, case is always ignored when searching for matching
filenames.
You can also use patterns in the directory part of the specification,
in just the same way as in the filename part. (Did you know that you can
also do this in most AmigaDOS commands supporting patterns, such as
DELETE?) All the directories matching that specification will be searched
in turn. However, you cannot split a pattern across directories -- in other
words, a pattern must not include a device or directory separator (":" or
"/"). This means that a given pattern can only match directory names at a
certain "level" in the file hierarchy of the disk. Also you cannot use a
pattern in a device specifier -- these must be simple names. To search
more than one level, or more than one device, you must have more specifier
arguments.
In File Name Search mode, if you don't supply any other pattern, you may
put slice marks in the file name portion of the specifier. You cannot
place them in the directory part. Except in this particular situation
(no main pattern present), the program will ignore slices in the filename.
Examples:
These are valid file specifiers:
myfile.txt
my#?file(.txt|_bak)
df1:work/myfile
:work/myfile
/work/myfile
:(work|old)/my^#?
These are not:
df(0|1):#?/#? -- pattern in device part
df1:/#(work/)myfile -- pattern includes directory separator
:w^#?/my^#? -- slice mark in directory part
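The restrictions above are simple enough to check mechanically. The Python helper below is a hypothetical sketch, not part of Mat. It rejects pattern characters in the device part, unbalanced parentheses within a single path component (which is how a pattern that spans a "/" shows up), and slice marks anywhere but the last component. It does not check the rule that filename slice marks are honoured only when no main pattern is given.

```python
PATTERN_CHARS = set("#?()|~*")


def check_specifier(spec: str):
    """Return None if spec looks valid, else a reason (hypothetical helper)."""
    device, colon, rest = spec.partition(":")
    if not colon:
        device, rest = "", spec                # no device part at all
    if PATTERN_CHARS & set(device):
        return "pattern in device part"
    if "^" in device:
        return "slice mark in directory part"
    parts = rest.split("/")
    for k, part in enumerate(parts):
        depth = 0
        for ch in part:
            depth += {"(": 1, ")": -1}.get(ch, 0)
            if depth < 0:
                break                          # ")" with no matching "("
        if depth != 0:
            return "pattern includes directory separator"
        if "^" in part and k != len(parts) - 1:
            return "slice mark in directory part"
    return None
```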
+ + + + +
========
Distribution and Copyrights
___________________________
Mat itself and this manual are copyright, but may be freely distributed
without charge. Commercial use is prohibited without the express written
permission of the author.
No fee is asked for the non-commercial use of this program, but if one
day you're feeling generous...
Remarks and Suggestions to:
Peter Goodeve
3012 Deakin Street #D
Berkeley, Calif. 94705
%%%%%%%%%%%%